Conceptual Exercises

Applied Exercises

4. Generate a simulated two-class data set with 100 observations and two features in which there is a visible but non-linear separation between the two classes. Show that in this setting, a support vector machine with a polynomial kernel (with degree greater than 1) or a radial kernel will outperform a support vector classifier on the training data. Which technique performs best on the test data? Make plots and report training and test error rates in order to back up your assertions.

set.seed(1)
x<-matrix(rnorm(200*2),ncol=2)
x[1:100,]<-x[1:100,]+2
x[101:150,]<-x[101:150,]-2
y<-c(rep(1,150),rep(2,50))
dat<-data.frame(x=x,y=as.factor(y))
plot(x,col=y)

By plotting the data, we can check whether the classes are linearly separable; they do not appear to be. We will now apply a support vector classifier.

library(e1071)
svmfit<-svm(y~.,data=dat,kernel="linear",cost=10,scale=FALSE)
plot(svmfit,dat)
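The exercise asks us to report training error rates, so let's compute the classifier's training error directly from its fitted values (a quick sketch using the svmfit and dat objects defined above):

```r
# Predicted class labels on the training data
train.pred <- predict(svmfit, dat)
# Confusion matrix of predictions against the true labels
table(predict = train.pred, truth = dat$y)
# Training error rate: the fraction of misclassified training observations
mean(train.pred != dat$y)
```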

Use tune() to perform ten-fold cross-validation over a range of cost values and store the best model.

set.seed(1)
tune.out<-tune(svm,y~.,data=dat,kernel="linear",ranges=list(cost=c(0.001,0.01,0.1,1,5,10,100)))
bestmod<-tune.out$best.model

Now we can predict the class labels on a set of test observations, generated in the same way as the training data.

xtest<-matrix(rnorm(200*2),ncol=2)
xtest[1:100,]<-xtest[1:100,]+2
xtest[101:150,]<-xtest[101:150,]-2
ytest<-c(rep(1,150),rep(2,50))
testdat<-data.frame(x=xtest,y=as.factor(ytest))
ypred<-predict(bestmod,testdat)
table(predict=ypred,truth=testdat$y)

The confusion matrix gives the test error rate of the support vector classifier. A substantial fraction of the test observations are misclassified: class 2 sits between the two clusters of class 1, so no linear boundary can separate the classes well.

Moving on to the support vector machine. First, split the data into training and test groups, then fit with a radial kernel.

train<-sample(200,100)
svmfit<-svm(y~.,data=dat[train,],kernel="radial",gamma=1,cost=1)
plot(svmfit,dat[train,])
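To back up the visual impression with numbers, the radial SVM's training error on this fold can be computed the same way (assuming the svmfit, dat, and train objects from above):

```r
# Confusion matrix and training error rate on the training fold
train.pred <- predict(svmfit, dat[train, ])
table(predict = train.pred, truth = dat[train, "y"])
mean(train.pred != dat[train, "y"])
```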

This is cool. It shows an apparent non-linear boundary. Now, let’s tune.

set.seed(1)
tune.out<-tune(svm,y~.,data=dat[train,],kernel="radial",ranges=list(cost=c(0.1,1,10,100,1000),gamma=c(0.5,1,2,3,4)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##     1     1
## 
## - best performance: 0.08 
## 
## - Detailed performance results:
##     cost gamma error dispersion
## 1  1e-01   0.5  0.23 0.10593499
## 2  1e+00   0.5  0.09 0.09944289
## 3  1e+01   0.5  0.09 0.09944289
## 4  1e+02   0.5  0.11 0.11005049
## 5  1e+03   0.5  0.10 0.10540926
## 6  1e-01   1.0  0.23 0.10593499
## 7  1e+00   1.0  0.08 0.09189366
## 8  1e+01   1.0  0.11 0.11005049
## 9  1e+02   1.0  0.12 0.10327956
## 10 1e+03   1.0  0.13 0.11595018
## 11 1e-01   2.0  0.23 0.10593499
## 12 1e+00   2.0  0.08 0.09189366
## 13 1e+01   2.0  0.11 0.11005049
## 14 1e+02   2.0  0.13 0.11595018
## 15 1e+03   2.0  0.16 0.10749677
## 16 1e-01   3.0  0.23 0.10593499
## 17 1e+00   3.0  0.08 0.09189366
## 18 1e+01   3.0  0.12 0.12292726
## 19 1e+02   3.0  0.14 0.09660918
## 20 1e+03   3.0  0.15 0.10801234
## 21 1e-01   4.0  0.23 0.10593499
## 22 1e+00   4.0  0.08 0.09189366
## 23 1e+01   4.0  0.11 0.11972190
## 24 1e+02   4.0  0.15 0.10801234
## 25 1e+03   4.0  0.15 0.10801234

It appears the best cost is 1 and the best gamma is 1, with a cross-validation error of 0.08. Now time to predict!

table(true=dat[-train,"y"],pred=predict(tune.out$best.model,newdata=dat[-train,]))

The confusion matrix gives the test error rate of the tuned radial SVM, which is well below that of the support vector classifier. The non-linear decision boundary of the SVM outperforms the SVC's linear boundary, which is expected because the true class boundary is non-linear.
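The exercise also allows a polynomial kernel with degree greater than 1. Here is a sketch of the analogous fit on the same training fold; the degree-2 choice and the cost grid are assumptions, not tuned choices from the text:

```r
# Tune an SVM with a degree-2 polynomial kernel via 10-fold cross-validation
set.seed(1)
tune.poly <- tune(svm, y ~ ., data = dat[train, ], kernel = "polynomial",
                  degree = 2,
                  ranges = list(cost = c(0.1, 1, 10, 100, 1000)))
# Test confusion matrix and error rate for the best polynomial model
poly.pred <- predict(tune.poly$best.model, newdata = dat[-train, ])
table(true = dat[-train, "y"], pred = poly.pred)
mean(poly.pred != dat[-train, "y"])
```

Comparing this test error rate with those of the linear and radial fits completes the comparison the exercise asks for.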